Detailed Action
Response to Amendment 

1. In response to the Office Action filed August 7, 2007, applicant has submitted an
Amendment, filed November 8, 2007, canceling claims 2, 11-13, 16, 25-27, 31, and 34.
Applicant has also amended independent claims 1, 15, 30, 33, and 37 to recite
calculating an overall probability for ones of the set of language classes by evaluating
the probability for the document properties set based on the attribute model and the
probability for the byte occurrences based on the text model.

EXAMINER'S AMENDMENT 

2. An examiner's amendment to the record appears below. Authorization for this 
examiner's amendment was given in a telephone interview with Robert R. Sachs on 
January 17, 2008. 

Independent claims 1, 15, and 30 have been amended to state the following:

1. A system for identifying language attributes through probabilistic analysis, 
comprising: 

a storage system adapted to store a set of language classes, which each 
identify a language and a character set encoding, and further adapted 
to store a plurality of training documents; 
an attribute modeler adapted to train an attribute model by evaluating
occurrences of one or more document properties within the training
documents and, for each language class, calculating a probability for
[[the]] a set of the one or more document properties, [[set]] the
probability conditioned on the occurrence of the language class, the
trained attribute model stored in the storage; wherein the document
properties comprise at least one of top level domain, HTTP content
character set encoding and language header parameters, and HTML
content character set encoding and language metatags;

a text modeler adapted to train a text model by evaluating byte occurrences 
within the training documents and, for each language class, calculating 
a probability for [[the]] a set of byte occurrences, the probability
conditioned on the occurrence of the language class, the trained text 
model stored in the storage; and 

a training engine adapted to calculate an overall probability for [[ones]] at 
least one of the set of language classes by evaluating the probability 
for the document properties set based on the attribute model and the 
probability for the byte occurrences based on the text model. 



15. A method for identifying language attributes through probabilistic analysis, 
comprising: defining a set of language classes, which each identify a 
language and a character set encoding, and a plurality of training 
documents; evaluating occurrences of one or more document 
properties within the training documents and, for each language class, 
calculating a probability for the document properties set conditioned on 
the occurrence of the language class by an attribute model wherein the 
document properties comprise at least one of top level domain, HTTP 
content character set encoding and language header parameters, and 
HTML content character set encoding and language metatags; [[and]] 
evaluating byte occurrences within the training documents and, for
each language class, calculating a probability for the byte occurrences 
conditioned on the occurrence of the language class by a text 
model[[.]]; and calculating an overall probability for ones of the set of 
language classes by evaluating the probability for the document 
properties set by the attribute model and the probability for the byte 
occurrences by the text model. 

30. A system for identifying documents by language using probabilistic 
analysis of language attributes, comprising a set of language classes, 
each language class comprising a language name and a character set 
encoding name; a training corpora comprising a plurality of training 
documents; an attribute modeler adapted to train an attribute model by 
evaluating a top level domain and character set encoding associated 
with the training documents and, for each language class, calculating a 
probability for each such top level domain and character set encoding 
conditioned on the occurrence of the each language class wherein the 
document properties comprise at least one of top level domain, HTTP 
content character set encoding and language header parameters, and 
HTML content character set encoding and language metatags; [[and]]
a text modeler adapted to train a text model by evaluating
co-occurrences of a plurality of bytes within the training documents and,
for each language class, calculating a probability for the byte
co-occurrences conditioned on the occurrence of the each language
class[[.]]; and a training engine adapted to calculate an overall
probability for ones of the set of language classes by evaluating the
probability for the top level domain and character set encoding, based 
on the attribute model and the probability for the byte occurrences 
based on the text model. 
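The training and scoring recited in claims 1, 15, and 30 can be illustrated with a minimal, non-authoritative sketch. It assumes a naive Bayes combination in log space, add-one smoothing, byte bigrams as the "co-occurrences of a plurality of bytes", and conditional independence between the attribute evidence (top level domain, character set encoding) and the text evidence. The training documents and the names `train` and `score` are hypothetical and are not taken from the application.

```python
import math
from collections import Counter, defaultdict

# Minimal sketch. Naive Bayes with add-one smoothing and byte bigrams are
# assumptions for illustration, not the method recited in the claims.

# Hypothetical training corpus: (language class, top level domain, charset, bytes).
# Each language class pairs a language name with a character set encoding name.
TRAINING = [
    (("en", "utf-8"), "com", "utf-8", "the cat sat on the mat".encode("utf-8")),
    (("en", "utf-8"), "uk", "utf-8", "the dog ran to the park".encode("utf-8")),
    (("de", "utf-8"), "de", "utf-8", "der hund lief zum park".encode("utf-8")),
    (("de", "utf-8"), "de", "iso-8859-1", "die katze sass auf der matte".encode("utf-8")),
]


def train(docs):
    """Attribute and text modelers: count properties and byte bigrams per class."""
    class_counts = Counter()
    attr_counts = defaultdict(Counter)    # class -> Counter of (property, value)
    bigram_counts = defaultdict(Counter)  # class -> Counter of (byte, byte)
    for cls, tld, charset, data in docs:
        class_counts[cls] += 1
        attr_counts[cls][("tld", tld)] += 1
        attr_counts[cls][("charset", charset)] += 1
        bigram_counts[cls].update(zip(data, data[1:]))
    return class_counts, attr_counts, bigram_counts


def score(model, tld, charset, data):
    """Overall probability per class from the attribute and text models."""
    class_counts, attr_counts, bigram_counts = model
    n_docs = sum(class_counts.values())
    # Distinct observed values per property, plus one slot for unseen values.
    values = defaultdict(set)
    for counts in attr_counts.values():
        for prop, val in counts:
            values[prop].add(val)
    scores = {}
    for cls, n_cls in class_counts.items():
        logp = math.log(n_cls / n_docs)  # class prior
        # Attribute model: add-one smoothed P(value | class) for each property.
        for prop, val in (("tld", tld), ("charset", charset)):
            k = len(values[prop]) + 1
            logp += math.log((attr_counts[cls][(prop, val)] + 1) / (n_cls + k))
        # Text model: add-one smoothed P(bigram | class) over all 256*256 byte pairs.
        total = sum(bigram_counts[cls].values())
        for bg in zip(data, data[1:]):
            logp += math.log((bigram_counts[cls][bg] + 1) / (total + 256 * 256))
        scores[cls] = logp
    # Normalise the joint log-scores into posterior probabilities (log-sum-exp).
    m = max(scores.values())
    z = sum(math.exp(s - m) for s in scores.values())
    return {cls: math.exp(s - m) / z for cls, s in scores.items()}


model = train(TRAINING)
post = score(model, "de", "utf-8", "der park".encode("utf-8"))
print(max(post, key=post.get))  # ('de', 'utf-8')
```

Working in log space avoids underflow when many byte-bigram probabilities are multiplied; the final log-sum-exp step turns the joint scores into an overall probability for each language class.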
The original dependent claim 32 has been amended to state the following:

32. A system according to Claim 30, further comprising: a plurality of unlabeled 
documents; and a classifier classifying one or more unlabeled documents by at 
least one language class, comprising: an attribute evaluator determining 
document properties within the documents and initializing language class 
probability to each document from the attribute model; a text evaluator evaluating 
byte occurrences in the documents and updating the language class probability 
of the each document from the text model; a pruner pruning at least one 
language class falling below a predetermined probability threshold; and an 
assignment module assigning at least one language class based on the 
language class probability of each document. 
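The classifier of claim 32 (attribute evaluator, text evaluator, pruner, assignment module) can be sketched as follows. This is a minimal illustration under stated assumptions: the model tables hold hypothetical, illustrative probabilities rather than trained values; unseen entries receive a small assumed floor; byte bigrams stand in for the "byte occurrences"; and pruning is applied once, after the text update, against an assumed threshold of 0.01. The names `classify`, `normalise`, `ATTRIBUTE_MODEL`, `TEXT_MODEL`, and `FLOOR` are hypothetical.

```python
import math

# Hypothetical pre-trained models with illustrative numbers (not from the application).
# Attribute model: P(property value | language class).
ATTRIBUTE_MODEL = {
    ("en", "utf-8"): {("tld", "com"): 0.6, ("tld", "de"): 0.05, ("charset", "utf-8"): 0.9},
    ("de", "utf-8"): {("tld", "com"): 0.2, ("tld", "de"): 0.7, ("charset", "utf-8"): 0.8},
    ("ja", "shift_jis"): {("tld", "com"): 0.2, ("tld", "de"): 0.01, ("charset", "utf-8"): 0.05},
}
# Text model: P(byte bigram | language class).
TEXT_MODEL = {
    ("en", "utf-8"): {(ord("t"), ord("h")): 0.03, (ord("c"), ord("h")): 0.005},
    ("de", "utf-8"): {(ord("t"), ord("h")): 0.002, (ord("c"), ord("h")): 0.02},
    ("ja", "shift_jis"): {},
}
FLOOR = 1e-6  # assumed probability for entries missing from a table


def normalise(log_scores):
    """Convert per-class log-scores into probabilities that sum to 1."""
    m = max(log_scores.values())
    z = sum(math.exp(s - m) for s in log_scores.values())
    return {c: math.exp(s - m) / z for c, s in log_scores.items()}


def classify(properties, data, threshold=0.01):
    # Attribute evaluator: initialise each class's log-probability from the attribute model.
    log_scores = {
        cls: sum(math.log(table.get(p, FLOOR)) for p in properties)
        for cls, table in ATTRIBUTE_MODEL.items()
    }
    # Text evaluator: update each class with byte-bigram log-probabilities.
    for cls in log_scores:
        table = TEXT_MODEL[cls]
        log_scores[cls] += sum(math.log(table.get(bg, FLOOR)) for bg in zip(data, data[1:]))
    # Pruner: drop classes whose normalised probability falls below the threshold.
    probs = normalise(log_scores)
    kept = {c: p for c, p in probs.items() if p >= threshold}
    # Assignment module: assign the most probable surviving class.
    best = max(kept, key=kept.get)
    return best, kept


best, kept = classify([("tld", "de"), ("charset", "utf-8")], b"ich")
print(best)          # ('de', 'utf-8')
print(sorted(kept))  # [('de', 'utf-8'), ('en', 'utf-8')]
```

With these illustrative numbers the Japanese class falls far below the threshold and is pruned, while the English class survives pruning but loses the assignment to German.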



Allowable Subject Matter 

3. Claims 1, 3-10, 14-15, 17-24, 28-30, 32-33, and 35-37 are allowed. 

4. The following is an examiner's statement of reasons for allowance: 

As to claims 1, 15, 30, 33, and 37, there is no prior art reference, alone or in
combination, that teaches or fairly suggests evaluating occurrences of one or more
document properties within the training documents and, for each language class,
calculating a probability for the document properties; wherein the document properties 
comprise at least one of top level domain, HTTP content character set encoding and 
language header parameters, and HTML content character set encoding and language 
metatags. 

Bracewell et al. (US PGPUB 2006/0041685) teaches that document properties 
like HTTP header information can be used in order to identify the language of a 
document or search query on the internet (paragraph [0014]). However, none of the
prior art of record teaches or suggests using document information such as top level
domain, HTTP content character set encoding and language header parameters, and HTML
content character set encoding and language metatags to calculate the occurrences of
each specific property, and further calculating probabilities from those occurrences
that can be directly associated with a particular language and used as part of a
language model to identify the language of the document.

Claims 3-10, 14, 17-24, 28, 29, 32, 35, and 36 are allowed because they further 
limit their parent claims, which are allowed. 

Any comments considered necessary by applicant must be submitted no later 
than the payment of the issue fee and, to avoid processing delays, should preferably 
accompany the issue fee. Such submissions should be clearly labeled "Comments on 
Statement of Reasons for Allowance." 
Conclusion

A note has been made to notify the appropriate parties that the examiner has moved 
from Art Unit 2609 to 2626. 

Any inquiry concerning this communication should be directed to Josiah
Hernandez, whose telephone number is 571-270-1646. The examiner can
normally be reached from 7:30 am to 5:00 pm.

If attempts to reach the examiner by telephone are unsuccessful, the
examiner's supervisor, David Hudspeth, can be reached at (571) 272-7843. The
fax phone number for the organization where this application or proceeding is 
assigned is 703-872-9306. 

Information regarding the status of an application may be obtained from 
the Patent Application Information Retrieval (PAIR) system. Status information 
for published applications may be obtained from either Private PAIR or Public 
PAIR. Status information for unpublished applications is available through 
Private PAIR only. For more information about the PAIR system, see
http://pair-direct.uspto.gov. Should you have questions on access to the Private PAIR
system, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free).

JH 

/David R Hudspeth/ 

Supervisory Patent Examiner, Art Unit 2626 



